Project Description

Sleep is very important for both physical and mental health. Good quality sleep helps the body repair cells, improves memory, protects against illness, and supports overall well-being. In this project, we will act as data science consultants for SleepInc, a startup that makes a sleep tracking app called SleepScope. Our goal is to analyze anonymous sleep data collected by the app to find out how different lifestyle factors affect sleep quality and sleep duration.

The data comes from SleepScope, which gathers information from users about their sleep and daily habits. As data scientists, we will use Python to explore this lifestyle survey data. We want to find relationships between exercise, gender, occupation, and sleep quality. By analyzing the data, we hope to discover patterns that explain why some people sleep better than others.

The data: sleep_health_data.csv

We have a dataset containing anonymized sleep and lifestyle data for 374 people. The data shows average values for each person based on their activity over the last six months. This file is named sleep_health_data.csv.

The dataset has 13 columns. They cover sleep duration, sleep quality, sleep disorders, physical activity, stress levels, BMI, blood pressure, heart rate, age, and other health and demographic details. This information will help us understand how these factors relate to sleep quality.

| Column | Description |
|---|---|
| Person ID | A unique number that identifies each person in the dataset |
| Gender | The gender of the person, either Male or Female |
| Age | The age of the person in years |
| Occupation | The type of job or profession the person has |
| Sleep Duration (hours) | The average number of hours the person sleeps each day |
| Quality of Sleep (scale: 1-10) | A score from 1 to 10 given by the person that shows how good they think their sleep is |
| Physical Activity Level (minutes/day) | How many minutes per day the person spends doing physical exercise |
| Stress Level (scale: 1-10) | A score from 1 to 10 showing how much stress the person feels on average |
| BMI Category | The body mass index category of the person, such as Underweight, Normal, or Overweight |
| Blood Pressure (systolic/diastolic) | The average blood pressure reading shown as systolic over diastolic pressure |
| Heart Rate (bpm) | The average resting heart rate measured in beats per minute |
| Daily Steps | The average number of steps the person takes each day |
| Sleep Disorder | Whether the person has a sleep disorder, such as None, Insomnia, or Sleep Apnea |

This dataset will allow us to explore how different lifestyle and health factors relate to sleep quality and duration. By analyzing this data, we hope to provide useful insights that could help SleepInc improve their app and help users sleep better.
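Two of the columns described above need some care when loading: Blood Pressure packs two numbers into one string, and Sleep Disorder uses the literal text None, which some pandas versions may read as a missing value. The sketch below is one possible way to handle both. It uses a small made-up sample rather than the real file, and the column names `Systolic` and `Diastolic` are hypothetical additions, not part of the dataset.

```python
import io
import pandas as pd

# Hypothetical three-row sample laid out like sleep_health_data.csv
# (only a few of the 13 columns are included; the values are made up).
sample_csv = io.StringIO(
    "Person ID,Blood Pressure,Sleep Disorder\n"
    "1,126/83,None\n"
    "2,140/90,Sleep Apnea\n"
    "3,132/87,Insomnia\n"
)

# keep_default_na=False keeps the literal text "None" as a string
# instead of letting the parser treat it as a missing value.
sample_df = pd.read_csv(sample_csv, keep_default_na=False)

# Split "systolic/diastolic" into two integer columns.
bp = sample_df["Blood Pressure"].str.split("/", expand=True).astype(int)
sample_df["Systolic"] = bp[0]   # hypothetical column name
sample_df["Diastolic"] = bp[1]  # hypothetical column name

print(sample_df)
```

With the real file, the same `keep_default_na=False` argument could be passed to `pd.read_csv('sleep_health_data.csv', ...)`, at the cost of empty cells then being read as empty strings rather than NaN.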

Which occupation has the lowest average sleep duration?

In [5]:
# import required library
import pandas as pd

# load the data
sleep_df = pd.read_csv('sleep_health_data.csv')

# Group by occupation and calculate mean sleep duration
sleep_duration = sleep_df.groupby('Occupation')['Sleep Duration'].mean()

# Get occupation with lowest average sleep duration
lowest_sleep = sleep_duration.sort_values().index[0]

print(f"The occupation *{lowest_sleep}* has the lowest average sleep duration.")
The occupation *Sales Representative* has the lowest average sleep duration.
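A group mean says nothing about how many people it is based on, so it may help to look at the group size next to it. The following is a minimal sketch on made-up toy data (the occupations and values are hypothetical, not taken from the dataset). It also uses `idxmin()` as a shorter alternative to sorting and taking the first index label.

```python
import pandas as pd

# Hypothetical toy data; occupations and values are made up for illustration.
toy_df = pd.DataFrame({
    "Occupation": ["Nurse", "Nurse", "Engineer", "Engineer", "Teacher"],
    "Sleep Duration": [6.0, 6.5, 7.5, 8.0, 6.1],
})

# Mean and group size per occupation in one pass.
summary = toy_df.groupby("Occupation")["Sleep Duration"].agg(["mean", "count"])

# idxmin() returns the index label of the smallest mean directly.
lowest_occ = summary["mean"].idxmin()

print(summary)
print(lowest_occ)
```

In this toy example the lowest mean belongs to a group with a single row, which is the kind of case where the ranking deserves a second look.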

Which occupation has the lowest average sleep quality?

In [7]:
# Group by occupation and calculate average sleep quality
sleep_quality = sleep_df.groupby('Occupation')['Quality of Sleep'].mean()  

# Get occupation with lowest average sleep quality 
lowest_sleep_quality = sleep_quality.sort_values().index[0]

print(f"The occupation *{lowest_sleep_quality}* has the lowest average sleep quality.")
The occupation *Sales Representative* has the lowest average sleep quality.
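Both questions use the same grouping, so they could also be answered in a single pass. The sketch below assumes the column names used in the cells above and runs on made-up toy data, so the occupations it reports are not results from the real dataset.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
toy_df = pd.DataFrame({
    "Occupation": ["Nurse", "Nurse", "Engineer", "Engineer", "Teacher", "Teacher"],
    "Sleep Duration": [6.0, 6.4, 7.5, 8.0, 6.8, 7.0],
    "Quality of Sleep": [6, 7, 8, 9, 5, 6],
})

# Average both metrics per occupation with one groupby.
means = toy_df.groupby("Occupation")[["Sleep Duration", "Quality of Sleep"]].mean()

# DataFrame.idxmin() gives, for each column, the occupation with the lowest mean.
lowest = means.idxmin()

# True only when the same occupation is lowest on both metrics.
same_occupation = lowest["Sleep Duration"] == lowest["Quality of Sleep"]

print(lowest)
print(same_occupation)
```

In the toy data the two metrics point at different occupations, which shows that the comparison is not guaranteed to come out the same way as it does in the notebook.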

Explore how BMI Category can affect sleep disorder rates. Find what ratio of app users in each BMI Category have been diagnosed with Insomnia.

In [15]:
# Filter the full dataframe to only rows where BMI Category is Normal and Sleep Disorder is Insomnia.
normal = sleep_df[(sleep_df["BMI Category"] == "Normal") & (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total normal rows
total_normal = len(sleep_df[sleep_df["BMI Category"] == "Normal"])
# Calculate normal insomnia ratio
normal_insomnia_ratio = round(len(normal) / total_normal, 2)

# Format to one decimal place so floating-point noise never reaches the output
print(f"Among users with a Normal BMI, {normal_insomnia_ratio * 100:.1f}% have been diagnosed with Insomnia.")



# Filter to only rows where BMI Category is Overweight and Sleep Disorder is Insomnia.
overweight = sleep_df[(sleep_df["BMI Category"] == "Overweight") & (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total overweight rows
total_overweight = len(sleep_df[sleep_df["BMI Category"] == "Overweight"])
# Calculate overweight insomnia ratio
overweight_insomnia_ratio = round(len(overweight) / total_overweight, 2)

# Format to one decimal place so floating-point noise never reaches the output
print(f"Among users with an overweight BMI, {overweight_insomnia_ratio * 100:.1f}% have been diagnosed with Insomnia.")


# Filter to only rows where BMI Category is Obese and Sleep Disorder is Insomnia.
obese = sleep_df[(sleep_df["BMI Category"] == "Obese") & (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total obese rows
total_obese = len(sleep_df[sleep_df["BMI Category"] == "Obese"])
# Calculate obese insomnia ratio
obese_insomnia_ratio = round(len(obese) / total_obese, 2)

# Format to one decimal place so floating-point noise never reaches the output
print(f"Among users with an Obese BMI, {obese_insomnia_ratio * 100:.1f}% have been diagnosed with Insomnia.")
Among users with a Normal BMI, 4.0% have been diagnosed with Insomnia.
Among users with an overweight BMI, 43.0% have been diagnosed with Insomnia.
Among users with an Obese BMI, 40.0% have been diagnosed with Insomnia.
In [16]:
# Create dictionary to store the ratios for each BMI category 
bmi_insomnia_ratios = {
    "Normal": normal_insomnia_ratio,  
    "Overweight": overweight_insomnia_ratio,
    "Obese": obese_insomnia_ratio 
}

bmi_insomnia_ratios
Out[16]:
{'Normal': 0.04, 'Overweight': 0.43, 'Obese': 0.4}
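The three filter-and-divide blocks above follow the same pattern, so the ratios could also be computed for every category at once. The sketch below is one way to do that, assuming the same column names and running on made-up toy data: the mean of a boolean column is the share of True values, so grouping the Insomnia flag by BMI Category yields the ratio per category.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
toy_df = pd.DataFrame({
    "BMI Category": ["Normal", "Normal", "Normal", "Normal",
                     "Overweight", "Overweight", "Obese", "Obese"],
    "Sleep Disorder": ["None", "None", "Insomnia", "Sleep Apnea",
                       "Insomnia", "Insomnia", "Sleep Apnea", "Insomnia"],
})

# Boolean flag: True where the person has been diagnosed with Insomnia.
has_insomnia = toy_df["Sleep Disorder"] == "Insomnia"

# Mean of a boolean Series per group = share of True values in that group.
ratios = has_insomnia.groupby(toy_df["BMI Category"]).mean().round(2)

# Same shape as bmi_insomnia_ratios: {category: ratio}.
ratio_dict = ratios.to_dict()

for category, ratio in ratio_dict.items():
    print(f"{category}: {ratio:.0%}")
```

This version picks up every label that appears in BMI Category, so any label not handled by the three explicit blocks would show up as its own entry rather than being silently left out.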

Final Thoughts

Sales Representatives have both the lowest average sleep duration and the lowest average sleep quality. Insomnia rates also differ sharply by BMI category: about 4% of users with a Normal BMI have been diagnosed with Insomnia, compared with 43% of Overweight users and 40% of Obese users. The data can be analyzed further to answer many more questions. Thank you!
In [ ]:
# Import pandas so this cell can run on its own
import pandas as pd

# Read in the data
sleep_df = pd.read_csv('sleep_health_data.csv')

# 1. Which occupation has the lowest average sleep duration? Save this in a string variable called `lowest_sleep_occ`.

# Group by occupation and calculate mean sleep duration
sleep_duration = sleep_df.groupby('Occupation')['Sleep Duration'].mean()
# Get occupation with lowest average sleep duration
lowest_sleep_occ = sleep_duration.sort_values().index[0]

# 2. Which occupation had the lowest quality of sleep on average? Did the occupation with the lowest sleep duration also have the worst sleep quality?

# Group by occupation and calculate average sleep quality
sleep_quality = sleep_df.groupby('Occupation')['Quality of Sleep'].mean()  
# Get occupation with lowest average sleep quality 
lowest_sleep_quality_occ = sleep_quality.sort_values().index[0]

# Compare occupation with the least sleep to occupation with the lowest sleep quality
same_occ = lowest_sleep_occ == lowest_sleep_quality_occ

# 3. Let's explore how BMI Category can affect sleep disorder rates. Start by finding what ratio of app users in each BMI category have been diagnosed with Insomnia.

# Normal
# Filter the full dataframe to only rows where BMI Category is Normal and Sleep Disorder is Insomnia.
normal = sleep_df[(sleep_df["BMI Category"] == "Normal") &
                  (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total normal rows               
total_normal = len(sleep_df[sleep_df["BMI Category"] == "Normal"])  
# Calculate normal insomnia ratio               
normal_insomnia_ratio = round(len(normal) / total_normal, 2) 


# Overweight
# Filter the full dataframe to only rows where BMI Category is Overweight and Sleep Disorder is Insomnia.
overweight = sleep_df[(sleep_df["BMI Category"] == "Overweight") &   
                      (sleep_df["Sleep Disorder"] == "Insomnia")]  
# Total overweight rows
total_overweight = len(sleep_df[sleep_df["BMI Category"] == "Overweight"])  
# Calculate overweight insomnia ratio 
overweight_insomnia_ratio = round(len(overweight) / total_overweight, 2)


# Obese
# Filter the full dataframe to only rows where BMI Category is Obese and Sleep Disorder is Insomnia.
obese = sleep_df[(sleep_df["BMI Category"] == "Obese") &  
                  (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total obese rows          
total_obese = len(sleep_df[sleep_df["BMI Category"] == "Obese"])  
# Calculate obese insomnia ratio
obese_insomnia_ratio = round(len(obese) / total_obese, 2)


# Create dictionary to store the ratios for each BMI category 
bmi_insomnia_ratios = {
    "Normal": normal_insomnia_ratio,  
    "Overweight": overweight_insomnia_ratio,
    "Obese": obese_insomnia_ratio 
}